NEW MODEL FOR POPULATION-SUBPOPULATION DIFFERENCES


1.  INTRODUCTION 

Civil  defense  planning  requires  estimation  of  casualties  from  the  use  of 
chemical  warfare  agents  against  the  civilian  population.  Computer  models  of 
atmospheric  transport  and  dispersion  can  estimate  the  exposure  of  the  civilian 
population  to  chemical  warfare  agents  from  a  given  scenario.  To  assess  casualties, 
these  models  require  estimates  of  the  toxicity  of  chemical  warfare  agents  to  the  general 
population.  The  available  estimates  of  the  toxicity  of  chemical  warfare  agents  (Grotte 
and  Yang  2001)  are  for  military  personnel.  It  is  widely  believed  that  the  general 
population  is  more  susceptible  to  toxicants  (harmful  substances)  than  the  military 
subpopulation  is  and  that  the  general  population  has  more  variability  in  susceptibility  to 
toxicants  than  the  military  subpopulation  does.  In  the  absence  of  data  relevant  to  the 
soldier-to-civilian  adjustment,  a  subjective  estimate  must  be  used.  A  common  practice 
in  toxicology  is  to  account  for  an  unknown  difference  by  applying  an  uncertainty  factor; 
the  default  uncertainty  factor  for  the  difference  between  a  population  and  a 
subpopulation  is  10 — see,  for  example,  Whalan,  Foureman,  and  Vandenberg  (2006). 
Uncertainty  factors  are  typically  applied  to  a  low  percentile  of  a  distribution  to  estimate  a 
safe  level  of  exposure.  Application  of  uncertainty  factors  to  the  parameters  of  a 
distribution  is  unusual,  but  given  the  lack  of  methods  for  converting  the  parameters  for 
military  personnel  to  parameters  for  the  general  public,  such  an  application  might  be 
made.  Concern  that  estimates  based  on  a  factor  of  10  might  exceed  what  is 
mathematically  possible  led  to  Crosier  and  Sommerville  (2002)  and  Crosier  (2003).  The 
model  used  in  those  reports  has  been  criticized  for  its  distributional  assumptions.  This 
report  compares  the  previous  work  to  a  more  realistic  model  that  was  proposed  by  an 
associate  editor  of  a  journal. 


2.  BACKGROUND  AND  NOTATION 

For  each  individual  there  is  a  dose  that  is  just  sufficient  to  cause  a 
specified  response.  These  just  sufficient  doses  are  called  effective  doses  to  distinguish 
them  from  the  administered  doses  of  a  toxicological  study,  or  the  actual  doses  received 
by  individuals.  In  toxicology,  a  dose  is  an  amount,  such  as  two  pills,  a  teaspoonful,  or 
five  milligrams.  A  dosage  is  an  amount  relative  to  something  else,  such  as  two  pills  per 
day,  a  teaspoonful  with  each  meal,  or  five  milligrams  of  a  substance  per  kilogram  of 
body  mass.  For  exposure  to  toxicants  in  the  atmosphere,  the  dose  (amount  absorbed) 
is  unknown.  The  toxicity  of  inhaled  toxicants  is  characterized  by  the  exposure 
concentration  and  the  exposure  duration,  which  can  be  combined  into  a  single  number 
by  one  of  several  models.  The  distinction  among  dose,  dosage,  and  exposure  is  not 
needed  for  modeling  subpopulations;  henceforth,  the  term  dose  will  be  used  generically 
for  dose,  dosage,  or  exposure.  A  lognormal  distribution  of  effective  doses  is  typically 
used  both  by  toxicologists  for  the  analysis  of  data  and  by  modelers  for  the  prediction  of 
casualties — see,  for  example,  Cornwell  and  Marx  (2006). 

Toxicologists characterize a lognormal distribution of effective doses by its
median effective dose (ED50) and its probit slope. The probit slope is the reciprocal of
the standard deviation of log(effective dose), where log is the common (base 10)
logarithm. Therefore, the probit slope has units of standard deviations per one base-10
logarithm unit, or, equivalently, standard deviations per factor-of-10 change in the
dose. For both the population and a subpopulation, these toxicological parameters are
related to the mean μ and standard deviation σ of a normal distribution by


$$\mu_p = \log(\text{population ED50}) \qquad (1)$$

$$\mu_s = \log(\text{subpopulation ED50}) \qquad (2)$$

$$\sigma_p = 1 / (\text{population probit slope}) \qquad (3)$$

$$\sigma_s = 1 / (\text{subpopulation probit slope}) \qquad (4)$$


in  which  the  subscripts  p  and  s  represent  population  and  subpopulation,  respectively. 
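
As a minimal sketch of relations (1) through (4), the conversion from the toxicological parameters to the normal parameters can be written in Python. The function names and the example toxicant (ED50 of 100, probit slope of 6) are illustrative assumptions, not part of the original report.

```python
import math
from statistics import NormalDist


def normal_params(ed50, probit_slope):
    """Return (mu, sigma) of log10(effective dose), following (1)-(4)."""
    return math.log10(ed50), 1.0 / probit_slope


def fraction_responding(dose, ed50, probit_slope):
    """Fraction of a group whose effective dose is at or below `dose`."""
    mu, sigma = normal_params(ed50, probit_slope)
    return NormalDist(mu, sigma).cdf(math.log10(dose))


# Hypothetical toxicant: ED50 = 100, probit slope = 6.
mu, sigma = normal_params(100.0, 6.0)
print(mu, sigma)
# By definition, a dose equal to the ED50 affects half of the group.
print(fraction_responding(100.0, 100.0, 6.0))
```

The same two functions apply to the population and to a subpopulation; only the parameter values differ.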


3.  CRITICISM  OF  THE  ASSUMPTIONS 

The inconsistency among the assumptions that μ_p < μ_s, σ_p > σ_s, and
lognormal distributions of effective dose for both the population and the subpopulation
can be illustrated by a plot of the percent of individuals responding to a dose versus the
dose on lognormal probability paper. Figure 1 shows the lines for a population with
μ_p = 0, σ_p = 0.3 (solid line) and a subpopulation with μ_s = 0.2, σ_s = 0.1 (dashed line).


[Figure 1 image not reproduced; horizontal axis: Dose]

Figure 1. Probability Plot for a Population (solid) and a Subpopulation (dashed)

The lines in Figure 1 cross at a dose of about 2 (log dose = 0.3); at any dose > 2, the
fraction of the military subpopulation responding to the dose will be larger than the
fraction of the general population responding to the dose. This nonsensical result is a
direct consequence of the seemingly reasonable assumptions made in the problem formulation.
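
The crossing point can be checked numerically. The sketch below assumes only the parameter values quoted for Figure 1 and equates the two probit lines, (x - μ_p)/σ_p = (x - μ_s)/σ_s; the function names are illustrative.

```python
import math
from statistics import NormalDist


def crossing_log_dose(mu_p, sigma_p, mu_s, sigma_s):
    """Log10 dose at which the two probit lines intersect (sigma_p != sigma_s)."""
    return (mu_s * sigma_p - mu_p * sigma_s) / (sigma_p - sigma_s)


def fraction_responding(log_dose, mu, sigma):
    """Fraction of a group responding at the given log10 dose."""
    return NormalDist(mu, sigma).cdf(log_dose)


x_cross = crossing_log_dose(0.0, 0.3, 0.2, 0.1)
print(x_cross, 10 ** x_cross)  # log dose and dose at the crossing

# Compare the two groups below and above the crossing point.
for dose in (1.0, 3.0):
    x = math.log10(dose)
    print(dose, fraction_responding(x, 0.0, 0.3), fraction_responding(x, 0.2, 0.1))
```

Below the crossing the subpopulation responds less often than the population, as intended; above it the ordering reverses, which is the inconsistency described in the text.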


4.  SUBPOPULATION  MODEL 

This model from Crosier and Sommerville (2002) and Crosier (2003)
accepts the assumptions of the problem as given (that is, lognormal distributions of
effective doses for both the population and the subpopulation) and checks whether the
values for μ_p, σ_p, μ_s, and σ_s are mathematically consistent with those assumptions and
with the subpopulation size. Its development is repeated here to establish some
concepts and notation and for comparison to an alternative model in the next section.

Figure  2  shows  histograms  of  log(effective  dose)  for  a  population  and  a 
subpopulation  that  is  30%  of  the  population.  The  curves  in  Figure  2  are  not  probability 
densities  but  frequencies — normal  curves  fit  to  histograms — as  described,  for  example, 
in  Dixon  and  Massey  (1969).  Letting  x  =  log(effective  dose)  and  h(x)  =  height  of  the 
normal  curve  fit  to  a  histogram,  the  equations  for  the  normal  curves  for  the  population 
and  subpopulation  are: 

$$h_p(x) = \frac{N_p w}{\sigma_p (2\pi)^{1/2}} \exp\left[-\frac{(x - \mu_p)^2}{2\sigma_p^2}\right] \qquad (5)$$

$$h_s(x) = \frac{N_s w}{\sigma_s (2\pi)^{1/2}} \exp\left[-\frac{(x - \mu_s)^2}{2\sigma_s^2}\right] \qquad (6)$$

where  Np  is  the  size  of  the  population,  Ns  is  the  size  of  the  subpopulation,  and  w  is  the 
width  of  the  class  intervals,  or  frequency  bins,  used  to  construct  the  histograms. 

In Figure 2, the same set of class intervals was used to construct both the
population histogram and the subpopulation histogram. Because members of a
subpopulation are also members of the population, the subpopulation cannot have more
members in a class interval, or bin, of the histograms than the population does. In
terms of the normal curves, the height of the subpopulation curve cannot exceed the
height of the population curve at any value of x. Consider two cases.

Case 1: Supposing μ_s = μ_p, we have h_s(x) ≤ h_p(x) at x = μ_s = μ_p,
immediately leading to N_s/σ_s ≤ N_p/σ_p, or (N_s/N_p) σ_p ≤ σ_s; that is, the subpopulation
standard deviation cannot be less than N_s/N_p times the population standard deviation.
It is convenient to define the subpopulation size as a fraction, θ = N_s/N_p, of the
population size.

Case 2: Again supposing μ_s = μ_p, consider how large σ_s may be; σ_s
cannot exceed σ_p because the heavier tail of the subpopulation curve would become
higher than the tail of the population curve at some large value of x. Thus, the limits on
σ_s are established: θ σ_p ≤ σ_s ≤ σ_p.

[Figure 2 image not reproduced; horizontal axis: Log (Effective Dose)]

Figure 2. Histogram of Log (Effective Dose); Subpopulation Size is 30%.


For 0 < θ < 1 and θ σ_p < σ_s < σ_p, μ_s may vary over some range without the
height of the subpopulation curve exceeding the height of the population curve at any
value of x. If μ_s is equal to a limit of its allowable range, the subpopulation curve will
make contact with the population curve at the contact point. Figure 3 uses the normal
curves without the histograms to illustrate the contact point for the case with μ_s > μ_p.
The case with μ_s < μ_p would yield the mirror image of Figure 3. Denote the x coordinate
of the contact point by x_0. At the contact point, the heights of the two curves are the
same, h_s(x_0) = h_p(x_0). Also at the contact point, the derivatives of the two curves must
be the same, as otherwise the curves would cross. These two conditions (on heights
and derivatives) yield two equations that can be solved to obtain an expression that
identifies the feasible combinations of the parameters. To simplify the derivation, which
is given in Crosier (2003), it is helpful to use the linear transformation
z_p(x) = (x - μ_p) / σ_p. For the subpopulation, the mean and standard deviation of z_p(x) are

$$\delta = (\mu_s - \mu_p) / \sigma_p \qquad (7)$$

and

$$\varepsilon = \sigma_s / \sigma_p \qquad (8)$$

respectively, whereas for the population, z_p(x) has mean zero and variance one. The
derivation yields

$$\delta = \pm \left[ 2(\varepsilon^2 - 1) \ln(\theta/\varepsilon) \right]^{1/2} \qquad (9)$$

where ln is the natural (base e) logarithm. Equation (9) gives the limits of the feasible
range for δ as a function of θ and ε. The ranges of θ and ε are 0 < θ ≤ ε ≤ 1.
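
Equation (9) lends itself to a direct feasibility check. The following is a minimal sketch; the names `delta_limit` and `is_feasible` are illustrative assumptions, and the test values are chosen only to exercise the formula.

```python
import math


def delta_limit(theta, eps):
    """Largest feasible |delta| from (9) for subpopulation size theta and ratio eps."""
    if not 0.0 < theta <= eps <= 1.0:
        raise ValueError("requires 0 < theta <= eps <= 1")
    # Both factors are non-positive, so the product is non-negative;
    # max() only guards against a stray negative zero.
    return math.sqrt(max(0.0, 2.0 * (eps ** 2 - 1.0) * math.log(theta / eps)))


def is_feasible(theta, delta, eps):
    """True if (delta, eps) lies inside the feasible region for size theta."""
    if not 0.0 < theta <= eps <= 1.0:
        return False
    return abs(delta) <= delta_limit(theta, eps)


print(delta_limit(0.3, 0.6))
print(is_feasible(0.3, 0.5, 0.6), is_feasible(0.3, 1.0, 0.6))
```

The limit is zero at both ends of the range (ε = θ and ε = 1), so the subpopulation mean can differ from the population mean only when ε lies strictly between them.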


[Figure 3 image not reproduced; horizontal axis: Log (Effective Dose)]

Figure 3. Contact Point of Population and Subpopulation Curves


In toxicological applications, a subpopulation with δ > 0 is called a
resistant subpopulation and a subpopulation with δ < 0 is called a sensitive
subpopulation. The term feasible region will be limited to either the resistant
subpopulation case or the sensitive subpopulation case. It is only necessary to study
one case; the results apply to the other case by symmetry. Figure 4 shows the feasible
region (shaded) for a resistant subpopulation of size θ = .3. The feasible region
appears to be a semi-ellipse, but it is slightly asymmetric in the left-right direction; this
asymmetry is more pronounced for smaller values of θ. The line drawn from the origin
tangent to the feasible region in Figure 4 touches the feasible region at the point where
the ratio δ/ε is maximized. This combination of δ and ε, which is marked by a diamond
in Figure 4, yields the minimum value for μ_p because, from (7) and (8),

$$\mu_p = \mu_s - (\delta/\varepsilon)\,\sigma_s \qquad (10)$$

Equation (10) allows calculation of the median effective dose for the general population,
which is antilog(μ_p), from the known quantities μ_s and σ_s and the parameters (δ and ε)
that describe the relationship between the population and the subpopulation.

Figure 4. Feasible Region of δ and ε for a Subpopulation of Size θ = .3


The values of ε and δ that maximize the ratio δ/ε may be found
analytically. The ratio δ/ε can be at its maximum only if δ is at its maximum for the
given value of ε. Therefore, δ in the ratio δ/ε can be replaced by the right-hand side of
(9). Monotonic transformations are often used to simplify the process of finding
maxima; here, squaring works well. Taking the derivative of (δ/ε)^2 with respect to ε,
setting the derivative to zero, and solving for ε yields

$$\varepsilon_r = \theta \exp[(1 - \varepsilon_r^2)/2] \qquad (11)$$

Setting the derivative to zero fixes the value of ε, so it is denoted ε_r to indicate that it is
the value of ε that maximizes the ratio δ/ε. For fixed θ, (11) can be solved by a
numerical procedure; (9) can then be used to obtain δ_r, the value of δ that corresponds
to ε_r and hence maximizes the ratio δ/ε.
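
The report does not say which numerical procedure was used to solve (11). As one possible sketch, bisection works because ε - θ exp[(1 - ε^2)/2] is negative at ε = θ, positive at ε = 1, and increasing in between. The function name is an illustrative assumption.

```python
import math


def max_ratio_point(theta, tol=1e-12):
    """Solve (11) for eps_r by bisection, then get delta_r from (9)."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")

    def g(eps):
        # Root of g is the solution of (11).
        return eps - theta * math.exp((1.0 - eps * eps) / 2.0)

    lo, hi = theta, 1.0  # g(lo) < 0 < g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    eps_r = 0.5 * (lo + hi)
    delta_r = math.sqrt(2.0 * (eps_r ** 2 - 1.0) * math.log(theta / eps_r))
    return delta_r, eps_r


delta_r, eps_r = max_ratio_point(0.3)
print(round(delta_r, 3), round(eps_r, 3))
```

For θ = .3 this reproduces the maximum-ratio entries of Table 1 to three decimals.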

The application is a subpopulation-to-population problem: given θ, μ_s,
and σ_s, how large can |μ_s - μ_p| be? The solution involves maximizing the ratio δ/ε for
fixed θ. A population-to-subpopulation problem (given θ, μ_p, and σ_p, how large can
|μ_s - μ_p| be?) requires finding the maximum value of δ for fixed θ (the triangle in Figure
4). The maximum value of δ, δ_x, and the value of ε at which it occurs, ε_x, can be found
analytically by a procedure similar to the procedure used to find δ_r and ε_r.
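
The report does not write out the condition for the maximum mean. The sketch below assumes the stationarity condition 2ε^2 ln(θ/ε) = ε^2 - 1, which was derived here by differentiating the square of (9) with respect to ε; it is an illustration, not a formula quoted from the source. The function name is likewise an assumption.

```python
import math


def max_mean_point(theta, tol=1e-12):
    """Find eps_x maximizing delta in (9) for fixed theta, and the maximum delta_x."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")

    def k(eps):
        # Proportional to the derivative of delta**2 with respect to eps;
        # positive at eps = theta, negative at eps = 1, and decreasing between.
        return 2.0 * eps * eps * math.log(theta / eps) - eps * eps + 1.0

    lo, hi = theta, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    eps_x = 0.5 * (lo + hi)
    delta_x = math.sqrt(2.0 * (eps_x ** 2 - 1.0) * math.log(theta / eps_x))
    return delta_x, eps_x


delta_x, eps_x = max_mean_point(0.3)
print(round(delta_x, 3), round(eps_x, 3))
```

For θ = .3 the result agrees with the maximum-mean entries of Table 1 to three decimals, which gives some confidence in the assumed condition.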


5. COVARIATE MODEL


The  covariate  model,  which  was  proposed  by  an  anonymous  associate 
editor  of  The  American  Statistician,  describes  the  physical  process  of  generating  a 
subpopulation.  Military  personnel  are  selected  on  the  basis  of  characteristics  that  may 
be  correlated  with  effective  dose.  Let  X  be  the  random  variable  of  interest — here, 
log(effective  dose) — and  Y  be  the  random  variable  by  which  the  selection  of 
subpopulation  members  is  made.  For  convenience,  assume  that  both  X  and  Y  are 
standardized  so  that  the  population  has  mean  zero  and  variance  one  for  both  X  and  Y. 
If  the  subpopulation  consists  of  all  individuals  for  which  a  <  Y  <  b,  then  the  probability 
density  function  (pdf)  of  X  for  the  subpopulation  is 


$$f_{X_s}(x) = \frac{\int_a^b f_{X,Y}(x,y)\,dy}{\int_a^b f_Y(y)\,dy} \qquad (12)$$


where f_X,Y(x,y) is the joint probability density function of X and Y for the population
and f_Y(y) is the marginal probability density function of Y for the population. Note that
the integral in the denominator of (12) yields the value of θ. If a < b and f_X,Y(x,y) is a
bivariate normal distribution with correlation coefficient ρ > 0, then f_Xs(x) is not a normal
distribution. Therefore, the subpopulation model, which assumes that the subpopulation
has a normal distribution, cannot be correct for the application. It can, however, be
a useful approximation if the distribution of the subpopulation is close to normal.

The pdf f_Xs(x) can be obtained by numerical integration and compared to a normal
distribution with the same mean and variance as f_Xs(x) [the subpopulation mean δ and
variance ε^2 are obtained numerically from f_Xs(x)]. I compared f_Xs(x) of the covariate
model to its normal approximation by using the cumulative distribution function (CDF),
F_Xs(x). The CDF F_Xs(x) gives the fraction of individuals in the subpopulation responding
to a dose x. Let C be the maximum value of |F_Xs(x) - Φ((x - δ)/ε)| for any x, where Φ is
the standard normal CDF. Numerical calculations show that, for θ > .001, the criterion
C does not exceed .01 if |ρ| < .69.
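
The comparison described above can be sketched as follows. Two points are assumptions rather than the report's stated method: the inner integral of (12) is evaluated in closed form (for a standard bivariate normal, Y given X = x is normal with mean ρx and variance 1 - ρ^2), and the moments and CDF are obtained by the trapezoidal rule on a fixed grid. All function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def _cdf(z):
    """Standard normal CDF that also accepts infinite arguments."""
    if math.isinf(z):
        return 1.0 if z > 0 else 0.0
    return STD.cdf(z)


def subpop_pdf(x, rho, a, b):
    """f_Xs(x) from (12) for a standard bivariate normal with |rho| < 1."""
    s = math.sqrt(1.0 - rho * rho)
    theta = _cdf(b) - _cdf(a)
    selected = _cdf((b - rho * x) / s) - _cdf((a - rho * x) / s)
    return STD.pdf(x) * selected / theta


def fit_criterion(rho, a, b, lo=-8.0, hi=8.0, n=4000):
    """Return (delta, eps, C) using trapezoidal integration on [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    pdf = [subpop_pdf(x, rho, a, b) for x in xs]

    def trapz(vals):
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    delta = trapz([x * p for x, p in zip(xs, pdf)])
    eps = math.sqrt(trapz([(x - delta) ** 2 * p for x, p in zip(xs, pdf)]))
    cdf, worst = 0.0, 0.0
    for i in range(1, n + 1):
        cdf += 0.5 * h * (pdf[i - 1] + pdf[i])  # running CDF of the subpopulation
        worst = max(worst, abs(cdf - STD.cdf((xs[i] - delta) / eps)))
    return delta, eps, worst


# Example: select the upper 30% on the covariate (theta = .3) with rho = .5.
a = STD.inv_cdf(0.7)
print(fit_criterion(0.5, a, math.inf))
```

The grid limits and step count are arbitrary choices that are adequate for standardized variables; a production calculation would check convergence.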

To make a figure showing the feasible region of δ and ε for the covariate
model, it is necessary to fix θ and compute δ and ε numerically for various values of ρ,
a, and b. Attention to several special cases may enhance understanding of the
covariate model. First, if ρ = 0, the selection by Y is irrelevant; the subpopulation has a
normal distribution with the same parameter values as the population. Therefore, the
case ρ = 0 is represented by the single point δ = 0 and ε = 1. Second, if ρ = 1, X has
the same distribution as Y, which, for the subpopulation, is a truncated normal
distribution with a truncating a fraction q_1 and b truncating a fraction q_2. Johnson and
Kotz (1970) give formulas for the mean and variance of truncated normal distributions.
Third, if q_1 = q_2, then the distribution of X for the subpopulation is symmetrical, δ = 0,
and the value of ε depends on the two parameters ρ and θ = 1 - q_1 - q_2. Fourth, if
either q_1 = 0 or q_2 = 0 (indicating one-sided or single truncation), the results can again
be expressed in terms of the two parameters ρ and θ. If Y represents the health status
of individuals, and the military does not reject anyone for being too healthy, then q_2 = 0
is reasonable.
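
For the bivariate normal case, δ and ε can also be written in closed form, which makes the special cases above easy to check. The sketch below assumes the standard mean and variance formulas for a truncated standard normal, together with the decomposition X = ρY + (1 - ρ^2)^(1/2) Z with Z independent of Y; the function name and argument convention are illustrative and not taken from the report.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def truncation_moments(rho, q1, q2):
    """Return (delta, eps) when a fraction q1 is cut from the lower tail of Y
    and a fraction q2 from the upper tail, so that theta = 1 - q1 - q2."""
    theta = 1.0 - q1 - q2
    if theta <= 0.0:
        raise ValueError("q1 + q2 must be less than 1")
    # phi(a) and a*phi(a) both vanish as a -> -infinity (likewise for b -> +infinity).
    phi_a = a_phi_a = phi_b = b_phi_b = 0.0
    if q1 > 0.0:
        a = STD.inv_cdf(q1)
        phi_a = STD.pdf(a)
        a_phi_a = a * phi_a
    if q2 > 0.0:
        b = STD.inv_cdf(1.0 - q2)
        phi_b = STD.pdf(b)
        b_phi_b = b * phi_b
    mean_y = (phi_a - phi_b) / theta                          # mean of truncated Y
    var_y = 1.0 + (a_phi_a - b_phi_b) / theta - mean_y ** 2   # variance of truncated Y
    delta = rho * mean_y
    eps = math.sqrt(1.0 + rho * rho * (var_y - 1.0))
    return delta, eps


print(truncation_moments(0.0, 0.7, 0.0))    # no correlation: selection is irrelevant
print(truncation_moments(1.0, 0.7, 0.0))    # rho = 1 with single truncation
print(truncation_moments(0.8, 0.35, 0.35))  # symmetrical truncation
```

With ρ = 0 the result is δ = 0 and ε = 1, and with q_1 = q_2 the mean is zero, in line with the first and third special cases.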

For the general case of single or double truncation, Figure 5 shows the
feasible region of ε and δ for θ = .3 from the covariate model (dashed curves and
dash-dot line). The apex of the feasible region from the covariate model, which is marked by
a square in Figure 5, represents the point ρ = 1 and q_2 = 0. The parameter combination
ρ = 1 and q_2 = 0 puts the subpopulation into the upper tail of the population and
therefore was called the tail model by Crosier and Sommerville (2002). The left side,
which has long dashes, represents ρ = 1; q_2 varies from zero at the top to (1 - θ)/2 =
.35 at the bottom. The right side, which has short dashes, represents q_2 = 0; ρ varies
from one at the top to zero at the bottom. For single truncation (q_2 = 0), the feasible
region consists only of this curve of short dashes. The line at the base of the feasible
region, which has a dash-dot pattern, represents symmetrical truncation (q_1 = q_2); ρ
varies from one at the left end to zero at the right end. The upper boundary of the
feasible region from the subpopulation model is outlined in gray, and its maximum-ratio
and maximum-mean points are again marked with a diamond and triangle, respectively.
The circle in Figure 5 marks the intersection of the upper boundary of the feasible
region of the subpopulation model with the feasible region of the covariate model for
single truncation. This point is the worst-case parameter combination for a
subpopulation created by single truncation on a covariate, given the requirement for
approximately normal distributions for the population and the subpopulation. The
centroid from the subpopulation model (see Crosier 2003) is marked by an asterisk in
Figure 5. The feasible region of the subpopulation model indicates approximately the
region where the distribution of a subpopulation obtained by truncation on a covariate
can be adequately represented by a normal distribution.

Table 1 gives the standardized parameters δ and ε for resistant
subpopulations and the goodness-of-fit criterion C for the normal approximation of the
subpopulation distribution. The worst-case parameter sets (in terms of maximizing
|μ_p - μ_s|) are given for (1) double truncation, population-to-subpopulation conversion, (2)
double truncation, subpopulation-to-population conversion, and (3) single truncation.
For single truncation, there is no difference between maximizing δ and maximizing the
ratio δ/ε; hence, there is only one worst-case set of parameters for both
population-to-subpopulation and subpopulation-to-population conversions. In Figure 5, the three
cases of Table 1 are marked by a triangle, a diamond, and a circle, respectively.

The values of θ, δ, and ε in Table 1 should satisfy (9).

[Figure 5 image not reproduced; axis label: Subpopulation Standard Deviation (ε)]

Figure 5. Comparison of Feasible Regions from Covariate and Subpopulation Models


Table 1. Standardized Parameters and the Goodness-of-Fit Criterion for Resistant
Subpopulations. For sensitive subpopulations, multiply the means by -1.

         Double Truncation,       Double Truncation,
         Maximum Mean             Maximum Ratio            Single Truncation
Size     Mean    StDev   Fit      Mean    StDev   Fit      Mean    StDev   Fit
θ        δ_x     ε_x     C        δ_r     ε_r     C        δ_t     ε_t     C
.05      1.873   .434    .017     .993    .082    .009     1.807   .576    .016
.10      1.554   .489    .017     .974    .163    .012     1.490   .630    .015
.15      1.347   .532    .016     .942    .240    .013     1.287   .667    .014
.20      1.189   .569    .015     .901    .314    .014     1.134   .696    .013
.25      1.059   .602    .015     .853    .383    .014     1.008   .722    .013
.30      .946    .633    .014     .800    .447    .014     .900    .744    .012
.35      .846    .663    .014     .743    .507    .014     .805    .766    .011
.40      .756    .691    .013     .683    .563    .014     .719    .785    .011
.45      .673    .719    .012     .623    .614    .013     .640    .804    .010
.50      .596    .746    .012     .562    .662    .012     .568    .822    .010
.60      .455    .798    .010     .441    .748    .011     .435    .857    .008
.75      .269    .875    .007     .266    .857    .008     .259    .908    .006


In  terms  of  the  application,  the  complementary  subgroup  consists  of  all 
persons  who  are  unfit  for  military  duty.  The  removal  of  all  healthy,  young  adults  from  a 
population  creates  an  unnatural  population  in  the  biological  sense;  it  is  not  surprising 
that  this  unnatural  population  does  not  have  a  normal  distribution. 

The covariate model only allows for selection by specification of an
acceptable range on the covariate. More general selection processes can be modeled
as follows. Let P_Y(y) = 1 for a < y < b and P_Y(y) = 0 otherwise. Then (12) can be
written as

$$f_{X_s}(x) = \frac{\int P_Y(y)\, f_{X,Y}(x,y)\,dy}{\int P_Y(y)\, f_Y(y)\,dy}$$

where the integrations extend from -∞ to ∞ mathematically, but, for numerical
integration, over an interval sufficient to include nearly all of the distribution. A process
for selecting a subpopulation that is more complicated than an acceptable range on a
covariate can be incorporated into the covariate model by letting P_Y(y) take on values
other than 0 and 1, subject to 0 ≤ P_Y(y) ≤ 1 for all y. Note that ∫ P_Y(y) f_Y(y) dy = θ. I call
P_Y(y) a selection function. A selection function can also be applied to, or defined for,
the variable of interest by using the heights of the histograms in Figure 2: let P_X(x) =
h_s(x)/h_p(x).
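
A selection function can be plugged into the numerical integration directly. The sketch below assumes a standard bivariate normal population and uses a logistic curve purely as a hypothetical example of a gradual selection rule; the logistic form, its parameters, and all function names are illustrative assumptions, not something proposed in the report.

```python
import math


def bvn_pdf(x, y, rho):
    """Standard bivariate normal density with correlation rho (|rho| < 1)."""
    s2 = 1.0 - rho * rho
    q = (x * x - 2.0 * rho * x * y + y * y) / s2
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(s2))


def std_pdf(y):
    """Standard normal density, the marginal of Y."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def trapz(f, lo=-8.0, hi=8.0, n=2000):
    """Trapezoidal rule over an interval holding nearly all of the distribution."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return h * total


def subpop_size(select):
    """theta: the integral of P_Y(y) f_Y(y) over y."""
    return trapz(lambda y: select(y) * std_pdf(y))


def subpop_pdf(x, rho, select):
    """f_Xs(x) for an arbitrary selection function with values in [0, 1]."""
    return trapz(lambda y: select(y) * bvn_pdf(x, y, rho)) / subpop_size(select)


def logistic_selection(y, midpoint=0.5, steepness=4.0):
    """Hypothetical gradual selection: acceptance probability rises with y."""
    return 1.0 / (1.0 + math.exp(-steepness * (y - midpoint)))


print(subpop_size(logistic_selection))
print(subpop_pdf(1.0, 0.6, logistic_selection), subpop_pdf(-1.0, 0.6, logistic_selection))
```

A constant selection function leaves the distribution of X unchanged, which is a convenient sanity check on any implementation.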


6.  COMPARISON  OF  METHODS 

From (10), the estimate of the ED50 for the general population depends on
1/σ_s, the probit slope of the toxicant for the military subpopulation. The probit slopes
listed in Grotte and Yang (2001) range from 3 to 12. Therefore, to compare the
methods of making estimates, I use fictitious toxicants with an ED50 of 100 and probit
slopes of 3, 6, and 12 for the military subpopulation. Table 2 gives the ED50 and the
probit slope for the general population by three methods. For the two truncation methods,
the probit slope for the general population is simply ε times the probit slope for the
military subpopulation.

Applying a factor of 10 to the parameters used in toxicology, the ED50 and
the probit slope, yields the estimates μ_p = μ_s - 1 and σ_p = 10 σ_s. The uncertainty factor
method does not explicitly depend on either the subpopulation size or the subpopulation
probit slope.
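
The two kinds of estimate can be sketched in a few lines. The covariate-based estimate applies (10) with a (δ, ε) pair taken from Table 1, and the uncertainty-factor estimate divides both toxicological parameters by 10. Function names are illustrative assumptions. Because the inputs from Table 1 are rounded to three decimals, values computed this way can differ from the published Table 2 entries in the last digit.

```python
import math


def covariate_estimate(ed50_s, slope_s, delta, eps):
    """Population (ED50, probit slope) from subpopulation values via (10)."""
    mu_s = math.log10(ed50_s)
    sigma_s = 1.0 / slope_s
    mu_p = mu_s - (delta / eps) * sigma_s
    return 10.0 ** mu_p, eps * slope_s


def uncertainty_factor_estimate(ed50_s, slope_s, factor=10.0):
    """Population (ED50, probit slope) from a factor applied to both parameters."""
    return ed50_s / factor, slope_s / factor


# Subpopulation size theta = .2, ED50 = 100, probit slope = 3.
print(uncertainty_factor_estimate(100.0, 3.0))
print(covariate_estimate(100.0, 3.0, 0.901, 0.314))  # double truncation, maximum ratio
print(covariate_estimate(100.0, 3.0, 1.134, 0.696))  # single truncation
```

Rounded to the precision of Table 2, these three calls correspond to the first row of that table.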

Table 2. Median Effective Dose and Probit Slope for the General Population

                              Type of Estimate for the General Population
         Military             Uncertainty        Double             Single
         Subpopulation        Factor             Truncation         Truncation
Size     ED50    Probit       ED50    Probit     ED50    Probit     ED50    Probit
                 Slope                Slope              Slope              Slope
θ                1/σ_s                1/σ_p              1/σ_p              1/σ_p
.2       100     3            10      .3         11      0.9        29      2.1
.2       100     6            10      .6         33      1.9        54      4.2
.2       100     12           10      1.2        57      3.7        73      8.4
.3       100     3            10      .3         26      1.4        39      2.2
.3       100     6            10      .6         51      2.7        63      4.4
.3       100     12           10      1.2        71      5.4        79      8.9
.4       100     3            10      .3         39      1.7        50      2.4
.4       100     6            10      .6         63      3.4        70      4.7
.4       100     12           10      1.2        79      6.7        84      9.5


The estimates for double truncation are based on the maximum-ratio
values δ_r and ε_r because the conversion is subpopulation to population. The basis for
the use of double truncation is that age is the covariate, and young adults correspond to
some age range, say, 18 to 35 years. The straight-line regression of log(effective dose)
on age, as implied by the bivariate normal distribution, results in children being more
resistant to toxicants than young adults are. Such an assumption is not acceptable for
risk assessment. The basis for the use of single truncation is that health status is the
covariate, and there is no segment of the population healthier than young adults. Single
truncation results in less extreme conversions than double truncation. This result may
seem backwards because single truncation puts the subpopulation into the tail of the
distribution of the covariate. Single and double truncation are not compared on the
same basis because the correlation coefficient ρ is not held constant. To obtain an
approximate normal distribution for the variable of interest, single truncation requires a
lower value of |ρ| than double truncation does. The lower value of |ρ| results in less
difference between the subpopulation and the population. In the application, the ED50
and the probit slope for both the population and the subpopulation are used as if they
apply to a lognormal distribution, so to compare single and double truncation it seems
better to use approximate normal distributions of log(effective dose) for both types of
truncation than to use the same value of ρ for both types of truncation.


7.  CONCLUSION 

As shown in Figure 5, most combinations of parameters indicated as
feasible by the subpopulation model are also indicated as feasible by the covariate
model. At the combinations of parameters indicated as feasible by the subpopulation
model, the discrepancy between the actual distribution of the subpopulation, as
obtained from the covariate model, and the normal distribution assumed by the
subpopulation model is small, as indicated by the values of the criterion C in Table 1.
More extreme worst-case scenarios can be obtained from the covariate model than
from the subpopulation model, but only by allowing the subpopulation to have a very
non-normal distribution.

The covariate model distinguishes between single and double truncation.
The combination of parameters suggested as worst-case values by Crosier (2003) can
be obtained from the covariate model only by double truncation on the covariate.
However, no covariate for which double truncation is appropriate has been suggested.
Age cannot be the covariate because it cannot have a bivariate normal distribution with
the logarithm of effective dose: sensitive individuals are at both ends of the age range.
If health status is the covariate, then only single truncation is a plausible procedure for
selecting military personnel from the general population. Single truncation on the
covariate limits the possible parameter combinations to the right-side boundary of
the feasible region for the covariate model in Figure 5. Hence, the combination of
parameter values previously suggested as the worst case (the diamond in Figure 5)
is not realistic. Similarly, the parameter combinations denoted centroid values (the
asterisk in Figure 5) by Crosier (2003) are also not realistic because they require double
truncation on the covariate. Another problem with centroid estimates is that they are
dependent on the scale over which the averaging is done. For example, they depend
on whether the standardization is by the population parameters [δ = (μ_s - μ_p) / σ_p and
ε = σ_s / σ_p] or by the subpopulation parameters [η = (μ_p - μ_s) / σ_s and ψ = σ_p / σ_s].


Disclaimer


The  findings  in  this  report  are  not  to  be  construed  as  an  official  Department  of  the  Army 
position  unless  so  designated  by  other  authorizing  documents. 


